The Effect of Coaching on the Predictive Validity of Scholastic Aptitude Tests
Abstract
The present study was designed to examine whether coaching affects the predictive validity and fairness of scholastic aptitude tests. Two randomly allocated groups, coached and uncoached, were compared. The results revealed that although coaching enhanced scores on the Israeli Psychometric Entrance Test by about 25% of a standard deviation, it did not affect predictive validity and did not create a prediction bias. The conclusions refute claims that coaching reduces predictive validity and creates a bias against uncoached examinees in predicting the criterion. The results are consistent with the idea that score improvement due to coaching does not result strictly from learning specific skills that are irrelevant to the criterion.

The Effect of Coaching on the Predictive Validity of Scholastic Aptitude Tests

The question of whether intelligence and scholastic aptitude test scores can be affected by interventions has been extensively discussed (e.g., Bond, 1989; Brody, 1992; Caruso, Taylor, & Detterman, 1982; Spitz, 1986). Until about twenty years ago, the commonly held view was that improvement due to coaching (in this paper the term "coaching" is used to refer to all types of test preparation) was very small. This view is clearly demonstrated by the following quotation from an ETS publication: "The magnitude of the gains resulting from coaching vary slightly but they are always small regardless of the coaching method used or the differences in the student coached" (ETS, 1965, p. 4).

Since the early seventies, many studies focusing on the effects of preparation for scholastic aptitude tests have been conducted. Recent meta-analyses of these studies (Messick & Jungeblut, 1981; Powers, 1993) demonstrated that scores on scholastic aptitude tests can be improved by focused preparation. The expected gains in an examinee's score following several weeks of coaching are generally small: the mean gain on the SAT (Scholastic Assessment Test, which consists of a verbal and a mathematical section) is, according to these meta-analyses, approximately one fifth of a standard deviation (beyond the gain that would be expected as a result of retesting alone, which is, according to Donlon, 1984, about one seventh of a standard deviation). Similar results were obtained in a study based on examinee feedback questionnaires for the Israeli Inter-University Psychometric Entrance Test (PET), which, like the SAT, consists of a mathematical and a verbal section, as well as an additional section that tests command of English as a foreign language (Oren, 1993). On both the PET and the SAT, coaching was more effective for the mathematical section (about one fourth of a standard deviation) than for the verbal section (about one sixth of a standard deviation). According to Messick and Jungeblut (1981), the improvement resulting from the first 20 hours of coaching is about 20% of a standard deviation on the mathematical subtest and about 12.5% of a standard deviation on the verbal subtest. The number of hours needed to double these gains is estimated at 120 for the mathematical subtest and 250 for the verbal subtest.

Special preparation is particularly common for scholastic aptitude entrance exams to institutes of higher learning. For example, in the United States, according to Powers (1988), 11% of the SAT examinees in 1986-87 took coaching courses, and 41% used preparation books.
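Throughout this literature, coaching effects are reported as standardized gains, that is, the raw score improvement divided by the test's standard deviation. The worked example below is included only to make this unit concrete; the 100-point standard deviation and the score values are illustrative assumptions, not the actual PET or SAT scale parameters.

```latex
% Standardized gain (effect size), with purely illustrative numbers:
d = \frac{\bar{X}_{\text{after}} - \bar{X}_{\text{before}}}{s}
  = \frac{520 - 500}{100} = 0.20
```

Under these illustrative numbers, a gain of "one fifth of a standard deviation" corresponds to 20 points on a scale whose standard deviation is 100.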
In Israel, the proportion of examinees taking coaching courses for the PET has increased dramatically, from 1% in 1984 (the first administration of the PET) to 42% in 1990 and 77% in 1996; in 1996, 90% of examinees used preparation books (Allalouf, 1984; Stein, 1990; Arieli, 1996).

Coaching involves three interrelated elements: (1) acquiring familiarity with the test (i.e., getting acquainted with the test instructions, item types, time limits, and answer-sheet format), which can be achieved by answering questions similar to the test questions under conditions as similar as possible to those encountered during the actual administration of the test; (2) reviewing material relevant to the test's contents, for example, learning mathematics when the test contains mathematical reasoning; and (3) learning testwiseness (TW), which can be defined as "a subject's capacity to utilize the characteristics and formats of the test and/or the test taking situation to receive a high score" (Millman, Bishop, & Ebel, 1965, p. 707). Four TW strategies, independent of test content or purpose, have been identified by Millman et al. (1965): efficient use of the available time, error avoidance, guessing, and deductive reasoning.

Many studies on coaching for scholastic aptitude tests have dealt with the SAT. These studies focused primarily on the effects of both commercial and noncommercial coaching on test scores. Many researchers, among them Messick and Jungeblut (1981), Anastasi (1981), and Bond (1989), have raised the question of the possible detrimental effects of coaching on test validity. Bond (1989, p. 440) wrote: "A continuing concern on the part of testing specialists, admissions officers, and others is that coaching, if highly effective, could adversely affect predictive validity and could, in fact, call into question the very concept of aptitude." Surprisingly, however, few research efforts have been devoted to studying the effects of coaching on the validity and fairness of scholastic aptitude or intelligence tests.

The earliest study dealing with the effect of coaching on predictive validity was conducted by Ortar (1960). The Triangle Test was administered to a group of 397 children aged 6-14 who were unfamiliar with it. The test consisted of three parts: the first part served as a baseline, the second part was used for coaching, and the third part was administered immediately after the coaching was completed. The scores on the first and third parts were used as predictors, and the criterion was based on teachers' evaluations of scholastic aptitude. The results indicated that the correlation with the criterion was significantly greater for the third part of the test than for the first part. Ortar's (1960) explanation for the improved predictive validity was that since coaching is a learning process, the after-coaching scores better reflect learning ability.

Bashi (1976) conducted a study with a design similar to the one used by Ortar (1960). The Raven Progressive Matrices (RPM) test was administered to 4,559 Israeli Arab students aged 10-14. Scores on achievement tests in mathematics and Arabic, as well as the teachers' evaluations of the students' relative standing in class, served as criteria. The test, which was not familiar to the students, was administered twice, with a very short coaching period of about one hour in between.
The mean gain following coaching was high and statistically significant (between one half and three quarters of a standard deviation). The results also showed a small but statistically significant improvement in the prediction of the above-mentioned criteria as a result of coaching.

Marron (1965) studied the effects of a long-term coaching program for the SAT and for the College Board Achievement Tests on the validity of these tests for predicting freshman class standing at military academies and selective colleges. Mean score gains were very high (about three quarters of a standard deviation). Marron found that in some of the preparatory programs, those in which the mean gain due to coaching was higher than in others, coaching led to an overprediction of academic performance.

Powers (1985) examined the effects of variations in the number of preparation hours on the predictive validity of the analytical section of the Graduate Record Examination (GRE). The self-reported grade averages of 5,107 undergraduates served as the "postdictive" criterion, and the preparation consisted solely of familiarization through self-testing. Powers concluded that "preparation of the kind studied may enhance rather than impair test validity" (p. 189).

Jones (1986) studied the effects of coaching on the predictive validity and bias of the Skills Analysis section of the Medical College Admission Test (MCAT). The criterion used by Jones was whether or not a student experienced academic problems in medical school. He analyzed two groups of self-reported coached and uncoached students, each consisting of 2,127 subjects (it was not reported whether coaching improved MCAT scores). The findings indicated that coaching does not lead to an overprediction of students' subsequent medical school performance.

Baydar (1990), using a simulation study, attempted to determine whether the decline in SAT validity (a decline of 8 percent between 1976 and 1985) was related to changes in the percentage of coached examinees. Freshman grade point average (FGPA) was used as the criterion, and the simulation indicated that, at most, ten percent of the decline in predictive validity could be explained by the increase in coaching density.

In contrast to the concern raised by Bond (1989) that coaching could adversely affect predictive validity, most of the above-mentioned studies indicated that coaching led to slight improvements in the predictive validity of scholastic aptitude tests, while no consistent picture emerged regarding whether these tests are biased against uncoached examinees. However, these empirical studies suffer from three problems: (1) most of the studies provide insufficient information about whether examinees actually underwent coaching and about the intensity of that coaching; (2) the coaching in some of the studies consisted of only a few hours and therefore cannot be compared with commercial courses, which offer much more intensive practice; and (3) in some of the studies, examinees were not randomly assigned to the coaching programs, and no control groups were used. This may explain some of the differences in the findings of these studies. In addition, most participants in the studies conducted 30 years ago were unfamiliar with the types of questions as well as with the test instructions, and therefore coaching had a relatively large impact on their scores. Today, most examinees who undergo coaching are already familiar with the test format prior to coaching.
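Two quantities recur throughout these studies: predictive validity, the correlation between test scores and the criterion, and over- or underprediction, the systematic sign of a group's criterion residuals around a regression line fitted without regard to group membership. The sketch below illustrates both computations on simulated data; the variable names, the generating model, and all numbers are hypothetical and serve only to make the definitions concrete, not to reproduce any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (purely illustrative): 'test' is the predictor,
# 'criterion' plays the role of, e.g., FGPA.
n = 1000
coached = rng.integers(0, 2, n).astype(bool)
ability = rng.normal(0.0, 1.0, n)
test = ability + 0.25 * coached + rng.normal(0.0, 0.6, n)   # coaching shifts test scores only
criterion = 0.5 * ability + rng.normal(0.0, 0.8, n)         # criterion reflects ability alone

# Predictive validity: correlation of test with criterion, overall and within each group.
r_all = np.corrcoef(test, criterion)[0, 1]
r_coached = np.corrcoef(test[coached], criterion[coached])[0, 1]
r_uncoached = np.corrcoef(test[~coached], criterion[~coached])[0, 1]

# Over-/underprediction: mean criterion residual of each group around a common regression line.
slope, intercept = np.polyfit(test, criterion, 1)
residuals = criterion - (intercept + slope * test)

print(f"validity: all={r_all:.3f}  coached={r_coached:.3f}  uncoached={r_uncoached:.3f}")
print(f"mean residual, coached   = {residuals[coached].mean():+.3f}  (negative => overpredicted)")
print(f"mean residual, uncoached = {residuals[~coached].mean():+.3f}  (positive => underpredicted)")
```

Under the hypothetical generating model above, coaching raises test scores but not the criterion, so the coached group would tend to be overpredicted; whether any such pattern appears in real data is precisely the empirical question addressed by the studies reviewed here.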
It should also be noted that some of the studies focused on intelligence tests rather than scholastic aptitude tests. Clearly, there is a need for an up-to-date, well-designed study that will shed more light on the effect of coaching on the predictive validity and fairness of scholastic aptitude tests.

In addition to the question of the influence of coaching on predictive validity, there is also the question of bias, which arises when examinees differ in the amount of coaching they have undergone. With the exception of Marron (1965) and Jones (1986), the studies mentioned above generally did not deal with this matter.

This study was designed to examine the effect of coaching on the predictive validity and fairness (or bias) of scholastic aptitude tests. Two main forms of test bias have been discussed in the literature (see Millsap, 1995): measurement bias, which refers to the relationship between the test and the latent variables it measures, and bias in prediction, which refers to the relationship between the test and a relevant criterion. The present study focuses only on the second type of test bias. To examine whether coaching creates bias in prediction, we adopted the definition of bias proposed by Cleary (1968), known as the regression model; in other words, we examine whether the criterion scores of the uncoached group are systematically underpredicted by their test scores, relative to the coached group. The method proposed by Lautenschlager and Mendoza (1986), based on the regression model, was applied in the present study to examine whether the test is biased against the uncoached group (a schematic sketch of this regression comparison is given below).

The findings should provide an empirically based answer to the oft-heard public criticism of these tests, namely the belief that preparation improves scholastic aptitude test scores significantly and that these tests therefore cannot serve as valid predictive tools. Of course, if coaching does not impair predictive validity and fairness, it might actually be desirable. From an applied perspective, institutions that use aptitude tests for admissions purposes would be able to take into account the impact of coaching on predictive validity, as well as the test's bias against uncoached applicants (if such bias is demonstrated).
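For readers unfamiliar with the regression (Cleary) model referred to above, the sketch below shows the general form of the nested-model comparison it implies: a common regression of the criterion on the test is compared with a model that also allows group-specific intercepts and slopes, and systematic residual differences between groups indicate prediction bias. This is a generic illustration on hypothetical data, not a reproduction of Lautenschlager and Mendoza's (1986) step-down procedure or of the analyses reported in this paper; all variable names and values are assumptions.

```python
import numpy as np
from scipy import stats

def ols(X, y):
    """Least-squares fit of y on X; returns (coefficients, residual sum of squares)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, float(resid @ resid)

def prediction_bias_test(test, criterion, group):
    """Compare a common regression line with a model allowing group intercept and slope shifts."""
    n = len(criterion)
    ones = np.ones(n)
    X_common = np.column_stack([ones, test])                      # criterion ~ test
    X_full = np.column_stack([ones, test, group, group * test])   # + group intercept and slope
    _, rss_common = ols(X_common, criterion)
    beta_full, rss_full = ols(X_full, criterion)
    # F-test on the two additional group parameters: a significant result means the
    # common regression line systematically over- or underpredicts one of the groups.
    df_num, df_den = 2, n - X_full.shape[1]
    F = ((rss_common - rss_full) / df_num) / (rss_full / df_den)
    p = stats.f.sf(F, df_num, df_den)
    return beta_full, F, p

# Hypothetical data: group = 1 for coached, 0 for uncoached examinees.
rng = np.random.default_rng(1)
n = 800
group = rng.integers(0, 2, n).astype(float)
ability = rng.normal(0.0, 1.0, n)
test = ability + 0.25 * group + rng.normal(0.0, 0.6, n)
criterion = 0.5 * ability + rng.normal(0.0, 0.8, n)

beta, F, p = prediction_bias_test(test, criterion, group)
print(f"group intercept shift = {beta[2]:+.3f}, group slope shift = {beta[3]:+.3f}")
print(f"F(2, {n - 4}) = {F:.2f}, p = {p:.4f}")
```

A step-down approach, such as the one cited above, examines such intercept and slope differences through a sequence of nested models rather than a single joint test; the F statistic here is only the simplest form of the comparison.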
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003